Combining linguistic and machine learning techniques for email summarization

Authors

  • Smaranda Muresan
  • Evelyne Tzoukermann
  • Judith L. Klavans
Abstract

This paper shows that linguistic techniques combined with machine learning can extract high-quality noun phrases that provide the gist, or summary, of email messages. We describe a set of comparative experiments using several machine learning algorithms for the task of salient noun phrase extraction. Three main conclusions can be drawn from this study: (i) for the task of gisting, the modifiers of a noun phrase can be semantically as important as the head; (ii) linguistic filtering improves the performance of machine learning algorithms; and (iii) a combination of classifiers improves accuracy.
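
As a rough illustration of conclusion (iii), the sketch below combines the outputs of several classifiers by simple majority vote. The abstract does not say which combination scheme the authors used, so the voting rule, the function name `majority_vote`, and the label names are all assumptions made for illustration.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-classifier label sequences by majority vote.

    predictions[i][j] is the label that (hypothetical) classifier i assigns
    to candidate noun phrase j. Ties go to the label seen first among the
    classifiers for that candidate.
    """
    if not predictions:
        return []
    n_items = len(predictions[0])
    if any(len(p) != n_items for p in predictions):
        raise ValueError("all classifiers must label the same candidates")

    combined = []
    # zip(*predictions) groups the votes cast for each candidate phrase.
    for votes in zip(*predictions):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Illustrative labels only; these are not data from the paper.
clf_a = ["salient", "not_salient", "not_salient"]
clf_b = ["salient", "salient", "not_salient"]
clf_c = ["not_salient", "not_salient", "salient"]
result = majority_vote([clf_a, clf_b, clf_c])
```

Majority voting is only one of several ways to combine classifiers; weighted voting or stacking would fit the same interface.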

Similar resources

GIST-IT: Combining Linguistic and Machine Learning Techniques for Email Summarization

We present a system for the automatic extraction of salient information from email messages, thus providing the gist of their meaning. Dealing with email raises several challenges that we address in this paper: heterogeneous data in terms of length and topic. Our method combines shallow linguistic processing with machine learning to extract phrasal units that are representative of email content...

Extractive Automatic Summarization: Does more Linguistic Knowledge Make a Difference?

In this article we address the usefulness of linguistic-independent methods in extractive Automatic Summarization, arguing that linguistic knowledge is not only useful, but may be necessary to improve the informativeness of automatic extracts. An assessment of four diverse AS methods on Brazilian Portuguese texts is presented to support our claim. One of them is Mihalcea’s TextRank; other two a...

A Publicly Available Annotated Corpus for Supervised Email Summarization

Annotated email corpora are necessary for evaluation and training of machine learning summarization techniques. The scarcity of corpora has been a limiting factor for research in this field. We describe our process of creating a new annotated email thread corpus that will be made publicly available. We present the trade-offs of the different annotation methods that could be used.

Combining Different Summarization Techniques for Legal Text

Summarization, like other natural language processing tasks, is tackled with a range of different techniques, particularly machine learning approaches, where human intuition goes into attribute selection and the choice and tuning of the learning algorithm. Such techniques tend to apply differently in different contexts, so in this paper we describe a hybrid approach in which a number of differen...

Flexible Summarization

Our project, initiated in 1997, approaches text summarization as a knowledge-scant task of passage selection. Several features make this task more discriminating. These features include "smart" key phrase selection that uses machine learning techniques and simple linguistic criteria; dynamic passage selection; adaptation to the type of text; and choice among several styles of summary. This pape...

Publication date: 2001